Development and validation of machine learning models to predict MDRO colonization or infection on ICU admission by using electronic health record data

Background Multidrug-resistant organisms (MDRO) pose a significant threat to public health. Intensive Care Units (ICU), characterized by the extensive use of antimicrobial agents and a high prevalence of bacterial resistance, are hotspots for MDRO proliferation. Timely identification of patients at high risk for MDRO can aid in curbing transmission, enhancing patient outcomes, and maintaining the cleanliness of the ICU environment. This study focused on developing a machine learning (ML) model to identify patients at risk of MDRO during the initial phase of their ICU stay. Methods Utilizing patient data from the First Medical Center of the People’s Liberation Army General Hospital (PLAGH-ICU) and the Medical Information Mart for Intensive Care (MIMIC-IV), the study analyzed variables within 24 h of ICU admission. Machine learning algorithms were applied to these datasets, emphasizing the early detection of MDRO colonization or infection. Model efficacy was evaluated by the area under the receiver operating characteristics curve (AUROC), alongside internal and external validation sets. Results The study evaluated 3,536 patients in PLAGH-ICU and 34,923 in MIMIC-IV, revealing MDRO prevalence of 11.96% and 8.81%, respectively. Significant differences in ICU and hospital stays, along with mortality rates, were observed between MDRO positive and negative patients. In the temporal validation, the PLAGH-ICU model achieved an AUROC of 0.786 [0.748, 0.825], while the MIMIC-IV model reached 0.744 [0.723, 0.766]. External validation demonstrated reduced model performance across different datasets. Key predictors included biochemical markers and the duration of pre-ICU hospital stay. Conclusions The ML models developed in this study demonstrated their capability in early identification of MDRO risks in ICU patients. Continuous refinement and validation in varied clinical contexts remain essential for future applications. Supplementary Information The online version contains supplementary material available at 10.1186/s13756-024-01428-y.

Methods Utilizing patient data from the First Medical Center of the People's Liberation Army General Hospital (PLAGH-ICU) and the Medical Information Mart for Intensive Care (MIMIC-IV), the study analyzed variables within 24 h of ICU admission.Machine learning algorithms were applied to these datasets, emphasizing the early detection of MDRO colonization or infection.Model efficacy was evaluated by the area under the receiver operating characteristics curve (AUROC), alongside internal and external validation sets.

Results
The study evaluated 3,536 patients in PLAGH-ICU and 34,923 in MIMIC-IV, revealing MDRO prevalence of 11.96% and 8.81%, respectively.Significant differences in ICU and hospital stays, along with mortality rates, were observed between MDRO positive and negative patients.In the temporal validation, the PLAGH-ICU model achieved an AUROC of 0.786 [0.748, 0.825], while the MIMIC-IV model reached 0.744 [0.723, 0.766].External validation demonstrated reduced model performance across different datasets.Key predictors included biochemical markers and the duration of pre-ICU hospital stay.

Development and validation of machine learning models to predict MDRO colonization or infection on ICU admission by using electronic health record data
Yun Li 1,2 † , Yuan Cao 1,2 † , Min Wang 1,2 † , Lu Wang 1,2 , Yiqi Wu 1,2 , Yuan Fang 1,2 , Yan Zhao 2 , Yong Fan 3 , Xiaoli Liu 3 , Hong Liang 3 , Mengmeng Yang 2 , Rui Yuan 1,2 , Feihu Zhou 2 , Zhengbo Zhang 3* and Hongjun Kang 2* Background Antimicrobial resistance constitutes a major threat to public health [1].Bacteria that are resistant to three or more classes of antimicrobial agents are typically categorized as multidrug-resistant organisms (MDRO).International experts collaboratively established an interim standard for defining MDRO in 2012, targeting five prevalent bacterial species: Staphylococcus aureus, Enterococcus spp., Enterobacteriaceae, Pseudomonas aeruginosa, and Acinetobacter spp., and meticulously specified the antimicrobial categories for defining multidrug resistance in these bacteria [2].The proliferation of MDRO infections contributes to a rise in the misuse of antimicrobials, heightens the likelihood of adverse drug events, extends the duration of hospitalization, and increases the mortality rates among patients [3].Intensive Care Units (ICU), characterized by extensive antimicrobial use and high bacterial resistance rates, are prominent areas for the prevalence of MDRO infections [4,5].
Promptly identifying patients at elevated risk for MDRO colonization or infection is beneficial for curtailing the dissemination of MDRO and bettering the patients' prognosis [6].During the early phase of ICU admission, it is common for healthcare providers to test body fluid samples to establish an infection diagnosis.Nevertheless, the commonly employed techniques for microbial culture and drug sensitivity testing in hospitals around the globe are protracted, with the process from sample delivery to report retrieval usually spanning several days [7].Methods proposed by Gupta et al. [8] to curb MDRO transmission and infection involve increasing laboratory test accuracy and the active cultivation of specimens from patients with potential infections.However, this strategy requires substantial medical resources [9], and during the Coronavirus Disease 2019 (COVID-19) pandemic, there were reports of hospitals interrupting MDRO screening and monitoring due to shortages in manpower and financial resources [10].Hence, analyzing from a medical resource optimization standpoint, focused surveillance supersedes broad-based monitoring.The development of an MDRO alert system, employing particular technological methods to intensively monitor high-risk patients, is crucial for diminishing the development and transmission of resistant bacteria and for better resource allocation in healthcare.
Machine learning (ML) has become increasingly prevalent in disease prediction models, demonstrating notable success in performance.Compared with logistic regression, ML can effectively deal with complex linear and nonlinear relationships between variables in a data set, which can greatly improve the prediction performance of diseases [11].There are also very few studies that have cross-validated multidrug-resistant bacteria prediction models from different countries.Therefore, in this research, we propose developing a predictive model based on ML, utilizing data obtained early during a patient's ICU stay.This model is intended to early detect patients with colonization or infection by MDRO, thereby decreasing MDRO proliferation and, to a certain degree, supporting empirical pharmacotherapy.

Study population and definitions
This study encompasses two datasets, one derived from the ICU of the First Medical Center of the People's Liberation Army General Hospital (PLAGH-ICU) with patient data spanning from January 2008 to January 2019, and the other from the Medical Information Mart for Intensive Care (MIMIC-IV version 2.2) database.The MIMIC-IV database provides comprehensive clinical information on patients admitted to the ICU at Beth Israel Deaconess Medical Center in the United States between 2008 and 2019 [12].Permission to use the data was obtained for MIMIC-IV databases (No.49,639,059).Given the de-indentified nature of the data, informed consent was waived.The datasets included information of patients who underwent microbial culture within 24 h of ICU admission.Patients under the age of 18 or those with an ICU stay shorter than 24 h were excluded.Patients detected with MDRO within 14 days prior to ICU admission were excluded (as these patients typically receive heightened clinical attention), and patients who reported a positive MDRO within 1 day of ICU admission were also excluded (eFigure 1).The data from 2008 to 2016 were used for model training, and the data from 2017 to 2019 for model validation, henceforth referred to as the training and temporal validation sets [13,14], respectively.Due to the anonymization process in the MIMIC-IV database, which limits the exact admission year to a three-year interval, data that could not be distinctly classified as pre-or post-2017 were not included in the training or temporal validation sets.
In accordance with international expert recommendations, bacteria resistant to three or more classes of antimicrobials were labeled as MDRO, primarily encompassing multiple drug-resistant strains of Staphylococcus aureus, Enterococcus spp., Enterobacteriaceae, Pseudomonas aeruginosa, and Acinetobacter spp.[2].Furthermore, per these recommendations, methicillin resistant staphylococcus aureus (MRSA) was directly categorized as an MDRO.In the definition of multidrug resistance, inherent natural resistance to a particular antimicrobial agent was not considered in determining resistance status for that agent.

Data extraction
Data were extracted for variables accessible within a 24-hour window preceding and succeeding patient admission to the ICU.These variables encompassed: (a) patient demographic data; (b) comorbidity profiles; (c) the latest laboratory test outcomes and vital sign measurements recorded immediately before and after ICU entry; (d) duration of hospitalization prior to ICU admission; (e) total count of hospital and ICU admissions; (f ) duration of antimicrobial and immunosuppressant medication usage preceding ICU admission; and (g) any instances of MDRO detection within a 90-day timeframe.The MIMIC-IV database, providing extensive patient medical histories unavailable in PLAGH-ICU, was utilized for this specific data extraction.
Specimens gathered within the initial 48 h of ICU admission were tested for MDRO colonization or infection.Key outcomes like duration of ICU and hospital stays, and in-hospital mortality, were also recorded.We excluded variables with missing data exceeding 30%, and cases with over 20% missing lab test values [15,16].In the PLAGH-ICU and MIMIC-IV original datasets, missing data were addressed using Multivariate Imputation by Chained Equations (MICE) [17].Following the completion of MICE imputation, each original dataset yielded five complete datasets, from which we selected one for modeling and validation.

Model development and validation
Feature selection and model training were independently executed within the PLAGH-ICU and MIMIC-IV datasets.The process commenced with Spearman's rank correlation for stratified clustering, isolating features without significant collinearity [18].When two variables were found to be collinear, we typically retained one of them based on clinical relevance and input from clinical experts.These were designated as candidate features.A Random Forest algorithm then fitted a model incorporating all candidates, and permutation feature importance ranking [19] was employed to distill features for final model input.Considering the reduction of model performance loss and ease of use in clinical settings, we ultimately included the top 25 features for prediction.Diverse algorithms, including Logistic Regression (LR) [20], K-Nearest Neighbor (KNN) [21], Support Vector Classifier (SVC) [22], Random Forest (RF) [23], eXtreme Gradient Boosting (XGBoost) [24], and Multilayer Perceptron (MLP) [25], were utilized for model construction.Before training the LR, KNN, SVC, and MLP models, the dataset underwent min-max normalization.
For hyperparameter optimization, Bayesian optimization [26] in conjunction with a 5-fold cross-validation approach was employed within the training set.Post hyperparameter tuning, models were trained using training set data, followed by performance evaluation in the temporal validation set.A stacking methodology [27] was utilized to amalgamate the four most efficacious models, creating a robust ensemble model, which underwent further validation.In addition, to assess the robustness of the imputation and its potential impact on the results, we performed a sensitivity analysis by applying the developed model to the temporal validation set after removing cases with missing data.
In the final phase, leveraging variables common to both PLAGH-ICU and MIMIC-IV databases, models were re-trained using the 15 most common features and fine-tuned in one database and subjected to external validation in the other.All predictive model development processes in this study were compliant with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) principles [13].

Statistical analysis
Continuous variables with deviations from a normal distribution in the baseline characteristics were quantified using the median and interquartile ranges to illustrate the central trend and distribution of the data.Categorical variables were presented as counts and percentages.Model classification efficacy was appraised by constructing receiver operating characteristic curves (ROC) and computing the area under these curves (AUROC).Decision curve analysis [28] and probability calibration curves [29] provided additional performance insights.The Shapley Additive Explanations (SHAP) method [30] was employed to ascertain the impact of variables on model output.The Kruskal-Wallis test was utilized for assessing differences in non-normally distributed or heteroscedastic datasets, while chi-square tests were used for rate or proportion comparisons, considering p-values below 0.05 as indicative of statistical significance.Python (3.9.16) and R (4.2.3) were the tools for machine learning modeling and statistical analyses.Python was primarily utilized for data preprocessing, feature engineering, and construction and evaluation of machine learning models (scikit-learn, pandas, numpy and shap).R is mainly used for statistical analysis, visualization, and partial data processing (dplyr, ggplot2 and pROC).

Baseline characteristics
The PLAGH-ICU dataset encompassed

Model evaluation
The PLAGH-ICU-based ensemble models demonstrated optimal performance in the temporal validation set, recording an AUROC of 0.786 [0.748, 0.825].In the MIMIC-IV models, the ensemble model achieved an AUROC of 0.744 [0.723, 0.766].ROC curves for these models are presented in Fig. 1.In terms of AUROC, Random Forest, XGBoost, and Ensemble methods outperformed other algorithms.Decision curve analysis, depicted in eFigure 3, revealed that in both datasets, the ensemble model outperformed others in lower high-risk threshold ranges, offering higher standardized net benefits.Calibration analysis of ensemble models developed from both datasets was conducted, focusing on Brier scores and the calibration curve metrics (Fig. 2).The PLAGH-ICU model recorded a Brier score of 0.1023 [0.0880, 0.1165], reflecting a higher predictive error, with its calibration curve intercept at 0.3308 [0.1519, 0.5096] and slope at 1.4766 [1.1950, 1.

Interpretability
Feature importance in LR, RF, XGBoost, and MLP models is visually represented in radar charts (Fig. 3), highlighting notable differences in feature prioritization among these models.SHAP analysis (Fig. 4) elucidates the impact of individual variables on the random forest models.In the PLAGH-ICU context, biochemical

Discussion
In this research, predictive models for early ICU MDRO colonization or infection were formulated utilizing the most significant 25 features from the PLAGH-ICU and MIMIC-IV datasets.These models reached AUROC of 0.786 and 0.744 in temporal validation, aligning with the acceptable accuracy standard cited in [31].Excluding data that necessitate clinical input or are not routinely collected, these models are easily adaptable to hospital Electronic Health Record (EHR) systems.By analyzing the SHAP values, it is possible to identify the features that have the greatest impact on the predicted outcome, and also to reveal the complex nonlinear relationships between these features and the predicted outcomes.This interpretive analysis not only enhances the transparency of the model, but also provides valuable insights for clinical decision making.Clinicians can incorporate this information into their decision-making process to more accurately identify high-risk patients and adjust management strategies accordingly.In addition, individual patient SHAP analysis demonstrates how each characteristic affects their individual predicted outcomes.This personalized interpretation can help clinicians understand a specific patient's unique risk profile, leading to a tailored treatment plan.Their application could assist clinicians in swiftly determining MDRO colonization or infection in new ICU patients, likely enhancing empirical antimicrobial usage and mitigating MDRO proliferation.Future implementations will focus on deploying these models in clinical settings with real-time data integration, facilitated by developing interfaces compatible with clinical EHR systems.The models will be incorporated into a clinical decision support system (CDSS) to deliver timely alerts and recommendations.The implementation will address challenges such as data access, privacy concerns, hardware and software integration, physician adoption, and regulatory compliance by engaging key stakeholders including clinicians, technology developers, and regulatory bodies.Additionally, a monitoring framework will be established to continuously assess and enhance the model's performance in clinical environments, ensuring that it remains effective and relevant.
In the temporal validation, the performance of the RF, XGBoost and ensemble models, as measured by AUROC, surpassed that of LR, KNN, SVC, and MLP.This superiority may be attributed to the fact that RF, XGBoost, and ensemble models are all ensemble methods, which have certain advantages in handling complex data.By integrating the predictive outcome of multiple models, these methods enhance the stability and accuracy of the model [23,32,33].Models derived from PLAGH-ICU and MIMIC-IV data showcase varied feature preferences.In the PLAGH-ICU models, the focus is on laboratory values and vital signs, with CRP, PCT, and IL-6 as primary indicators.Conversely, MIMIC-IV models prioritize pre-ICU information, including hospitalization count and recent MDRO detection, along with lab values like RDW, BUN, and MCHC.The variation in data completeness, particularly the higher absence of certain PLAGH-ICU indicators in the MIMIC-IV dataset, led to their exclusion in modeling.This limitation prevented evaluating these indicators' effectiveness across both databases.The noticeable drop in model performance during external validation, attributed to differences in pathogen epidemiology and medical practices, underscores the potential benefit of developing unit-specific MDRO early warning models.Similar issues have been observed in other studies [34].These factors underscore the significant challenges in creating predictive models with strong generalizability that can be applied across different institutions.Nevertheless, this remains a worthwhile endeavor.Collecting data from various hospitals and regions to construct universal predictive factors could potentially enhance the generalizability of these models.
This study employed only the initial EHR data from ICU admissions for model construction, confronting the inherent challenge of limited feature-target correlations, a common hurdle in MDRO prediction models typically struggling to attain high accuracy.Earlier research identifies key risk factors for MDRO infection, including age, immunodeficiency, invasive procedures, recent antibiotic use, repeated or prolonged hospitalizations, and prior MDRO colonization or infection [35,36].Yi Li et al. created a prediction model for carbapenem-resistant Klebsiella pneumoniae infection using data from three central Chinese hospitals' ICUs, validated on three other hospitals' data.They identified prior year colonization or infection, a CD4/CD8 ratio below 1, and over 48 h of parenteral nutrition as independent risk factors, achieving an AUROC of 0.844 in external validation [37].Li Wang et al. conducted a retrospective analysis of 336 ICU patients from the First Affiliated Hospital of Xiamen University, identifying increased Pitt bacteremia scores (PBS), male gender, and elevated CRP levels as independent risk factors in their logistic regression model, with an external validation AUROC of 0.77 [38].Wang et al. employed data from 688 ICU patients, utilizing Lasso and stepwise regression to extract nine independent MDRO infection risk factors for a backpropagation neural network (BPNN) model, validated externally with an AUC of 0.811 [39].Jiang et al. analyzed data from 297 neuro ICU patients, finding tracheal intubation, arterial blood pressure monitoring, fever, antibiotic use, and pneumonia as independent MDRO infection risk factors through binary logistic regression [40].While these studies highlight relevant risk factors, implementing these models directly in hospitals poses challenges.This difficulty is partly due to the lack of published code and model parameters in many studies, as well as the inability to directly apply models developed elsewhere to local hospital settings.This issue is exemplified by the significant performance drop observed when models developed in databases from different countries were validated against each other.Compared to previous studies, this study utilized multicenter data to establish a predictive model, featuring a larger volume of dataset and a more comprehensive set of features.This approach allows for the analysis of MDRO prediction models in various research contexts and provides valuable references for constructing models suitable for different institutions.
In aligning the predictive modeling with real-world clinical scenarios, this study's approach extends beyond merely identifying MDRO infection, encompassing both MDRO colonization and infection in the positive group and including non-MDRO positive cultures and negative cultures in the negative group.This strategy likely accounts for the notable difference in feature selection compared to other studies.Under these parameters, traditional lab markers for infection might have reduced predictive effectiveness, while metrics indicative of immune compromise or systemic weakness might emerge as more predictive.In the MIMIC-IV dataset, the duration of antimicrobial usage, though considered, did not feature prominently, likely reflecting the short median pre-ICU hospitalization duration (0.1 day).Elevated BUN and creatinine levels, often associated with renal function and nutritional status [41], appeared to increase the likelihood of MDRO positivity in the model.This could be attributed to renal impairment in severe infections or underlying renal conditions leading to malnutrition or weakened immunity.SHAP analysis also suggests that lower BUN and creatinine levels correlate with higher MDRO positivity, potentially indicating compromised nutritional and immune status, as evidenced by diminished muscle metabolism (reflected in low creatinine levels), hindering the clearance of MDRO.
The models indicate that elevated liver function markers, specifically gamma-glutamyl transferase (GGT) and bilirubin, heighten the likelihood of MDRO positivity, likely due to their impact on immune and nutritional status [42,43].RDW, a measure of red blood cell size variability, traditionally linked to anemia [44], emerges as a significant predictor in our model.Elevated RDW levels can reflect a state of inflammation, where erythropoietindriven erythropoiesis maintains hemoglobin levels until anemia eventually occurs.This fluctuation in red blood cell production causes size variations, hence the implication of RDW as an inflammation marker in our study.Moreover, the correlation between high RDW levels and nutritional deficiencies may further substantiate its predictive efficacy [45].While elevated white blood cell count (WBC), CRP, and PCT are conventional bacterial infection indicators, the model's varied reliance on these markers-particularly the underutilization of WBC in PLAGH-ICU and its lesser emphasis in MIMIC-IVunderscores their variable nature influenced by factors like age, immune status, and medication [46].Notably, in PLAGH-ICU, shorter pre-ICU hospital stays surprisingly correlated with increased MDRO risk, possibly reflecting a higher likelihood of resistant bacteria carriage among patients transferred from other hospitals after prolonged treatment.This observation could also be linked to community-acquired MDRO prevalence, necessitating further research.
This study has several limitations.First, it is a retrospective study using electronic health record data, and prospective validation of the model is needed to truly assess its impact on improving clinical practice.Second, although the model developed using PLAGH-ICU data has acceptable classification capabilities, its precision, specifically the positive predictive value, is not high, particularly at probability thresholds ensuring higher recall rates, which might entail high costs if applied clinically at this stage.Third, as previously mentioned, models built using data from specific institutions might not be directly applicable in other medical facilities; incorporating data from different units could be necessary to develop models with strong generalizability.Additionally, although we examined the feature importance across different models, it only indicates the correlation between variables and model predictions, not causality, and caution is needed when interpreting and applying these features.

Conclusion
Employing machine learning algorithms, this study developed models for predicting MDRO colonization or infection with data from MIMIC-IV and PLAGH-ICU.These models are instrumental in early identification of patients at high risk of MDRO colonization or infection upon ICU admission, a crucial step in managing antibiotic resistance and optimizing antimicrobial therapy.The models trained on ICU data from diverse geographic regions showed significant variance in feature selection and performance.This underscores the practicality of medical institutions using their own data to train models while integrating insights from broader research.Future endeavors should concentrate on refining the predictive efficacy of MDRO models and assessing their real-world applicability.
7582], showing significant calibration deviation.In contrast, the MIMIC-IV model, with a Brier score of 0.0554 [0.0517, 0.0592], demonstrated lower error.Its calibration curve featured an intercept of -0.6217 [-0.7160, -0.5274] and a slope of 1.085 [0.9755, 1.1947], indicating a smaller deviation from ideal calibration compared to the PLAGH-ICU model.In sensitivity analysis, the model's performance remained stable in the temporal validation set after removing cases with missing data (eFigure 4).The external validation involved assessing the PLAGH-ICU model on the MIMIC-IV dataset and vice versa.This process resulted in a reduction in model performance for both datasets.The PLAGH-ICU model reached a peak AUROC of 0.638 [0.628, 0.648] on MIMIC-IV, and the MIMIC-IV model attained an AUROC of 0.615 [0.585, 0.646] on PLAGH-ICU.The ROC curves of the external validation model are shown in eFigure 5.

Table 2
Baseline characteristics of MIMIC-IV patients